Genetic Programming for Document Segmentation and Region Classification Using Discipulus
Authors
Abstract
Document segmentation is a method of dividing a document into distinct regions. A document is a collection of information and a standard means of conveying information to others. Retrieving information from documents by hand takes a great deal of human effort, is time-intensive, and can severely limit the use of information systems. Automatic information retrieval from documents has therefore become an important problem. It has been shown that document segmentation helps to overcome such problems. This paper proposes a new approach to segment the document and classify its regions as text, image, drawing, or table. The document image is divided into blocks using the run-length smearing rule, and features are extracted from every block. The Discipulus tool was used to construct the genetic-programming-based classifier model, which achieved 97.5% classification accuracy.
Keywords: Document analysis; Information retrieval; Classification; Feature extraction; Document segmentation.
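The block-forming step mentioned in the abstract can be illustrated with a minimal sketch of run-length smearing as it is commonly described: in a binary image (1 = ink, 0 = background), short background runs between ink pixels are filled, once along rows and once along columns, and the two results are combined with a pixel-wise AND. The abstract does not give the thresholds or the exact variant used, so the function names `smear_row` and `rlsa`, the thresholds, and the choice to leave runs touching the image border unfilled are all assumptions made for illustration.

```python
def smear_row(row, threshold):
    """Fill runs of 0s no longer than `threshold` that lie between two 1s.

    Assumption: runs touching the start or end of the row are left as-is.
    """
    out = list(row)
    last_one = None  # index of the most recent foreground (1) pixel
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_one is not None:
                gap = i - last_one - 1
                if 0 < gap <= threshold:
                    for j in range(last_one + 1, i):
                        out[j] = 1
            last_one = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear rows and columns separately, then AND the two results."""
    horizontal = [smear_row(row, h_threshold) for row in image]
    # Transpose, smear each column as if it were a row, transpose back.
    columns = [list(col) for col in zip(*image)]
    smeared_cols = [smear_row(col, v_threshold) for col in columns]
    vertical = [list(row) for row in zip(*smeared_cols)]
    return [
        [h & v for h, v in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]


if __name__ == "__main__":
    page = [
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
    ]
    for line in rlsa(page, h_threshold=1, v_threshold=1):
        print(line)
```

Connected regions of the smeared image would then serve as candidate blocks for feature extraction. In practice the two thresholds are tuned to the scan resolution and typical character spacing.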
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Genetic Programming based DNA Microarray Analysis for Classification of Cancer
In this study the advantages of statistical gene selection are combined with the power of Genetic Programming (GP) to build classifiers for assigning gene expression microarray data samples to categories characteristic of certain cell states. To that end we implemented different statistical measures in a program called GENEACTIVATOR and tested their applicability to gene selection. Subsequently...
GENETIC PROGRAMMING BASED DNA MICROARRAY ANALYSIS FOR CLASSIFICATION OF CANCER by
In this study the advantages of statistical gene selection are combined with the power of Genetic Programming (GP) to build classifiers for assigning gene expression microarray data samples to categories characteristic of certain cell states. To that end we implemented different statistical measures in a program called GENEACTIVATOR and tested their applicability to gene selection. Subsequently...
Testing Discipulus Linear Genetic Programming Software on Real-world Environmental Engineering Challenges
Genetic Programming (GP) is a machine learning technique that writes computer programs automatically. Although individual researchers used GP techniques in the 1960s and 1970s, GP emerged as a distinct discipline in 1992. Since that time, over one thousand academic studies have been published in the field and, in 1998, commercial GP software, Discipulus, reached the market. Discipulus is a...
Comparison of Discipulus™ Linear Genetic Programming Software with Support Vector Machines, Classification Trees, Neural Networks and Human Experts
Discipulus™ is multiple-run, linear, genetic-programming software. Various versions have been available commercially since 1998 (see www.aimlearning.com). Discipulus creates models directly from data, like neural networks or support vector machines. This white paper reports on the result of a multi-year study of the performance of Discipulus by Science Applications International Corp (SAIC) a...
Journal:
- CoRR
Volume: abs/1303.0460
Issue: -
Pages: -
Publication date: 2013